Bringing contextual information to google speech recognition

نویسندگان

Petar S. Aleksic

Mohammadreza Ghodsi

Assaf Hurwitz Michaely

Cyril Allauzen

Keith B. Hall

Brian Roark

David Rybach

Pedro J. Moreno

چکیده

In automatic speech recognition on mobile devices, very often what a user says strongly depends on the particular context he or she is in. The n-grams relevant to the context are often not known in advance. The context can depend on, for example, particular dialog state, options presented to the user, conversation topic, location, etc. Speech recognition of sentences that include these n-grams can be challenging, as they are often not well represented in a language model (LM) or even include out-of-vocabulary (OOV) words. In this paper, we propose a solution for using contextual information to improve speech recognition accuracy. We utilize an on-the-fly rescoring mechanism to adjust the LM weights of a small set of n-grams relevant to the particular context during speech decoding. Our solution handles out of vocabulary words. It also addresses efficient combination of multiple sources of context and it even allows biasing class based language models. We show significant speech recognition accuracy improvements on several datasets, using various types of contexts, without negatively impacting the overall system. The improvements are obtained in both offline and live experiments.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods

Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterance into transcription. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...

متن کامل

Two-step correction of speech recognition errors based on n-gram and long contextual information

This paper presents a fully automatic word error correction on a confusion network that makes use of long contextual information. However, a problem with long contextual information is that improvement of the recognition accuracy is minimal because of the word errors surrounding words. In this paper, recognition errors are first reduced by error correction using N gram features. After that, the...

متن کامل

Developing a Standardized Medical Speech Recognition Database for Reconstructive Hand Surgery

Fast and holistic access to the patients’ clinical record is a major requirement of modern medical decision support systems (DSS). While electronic health records (EHRs) have replaced the traditional paper-based records in most healthcare organization, the data entry into these systems remains largely manual. Speech recognition technology promises substitution of the more convenient speech-base...

متن کامل

An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition

Convolutional Neural Networks (CNNs) have been shown their performance in speech recognition systems for extracting features, and also acoustic modeling. In addition, CNNs have been used for robust speech recognition and competitive results have been reported. Convolutive Bottleneck Network (CBN) is a kind of CNNs which has a bottleneck layer among its fully connected layers. The bottleneck fea...

متن کامل

Frame-by-frame language identification in short utterances using deep neural networks

This work addresses the use of deep neural networks (DNNs) in automatic language identification (LID) focused on short test utterances. Motivated by their recent success in acoustic modelling for speech recognition, we adapt DNNs to the problem of identifying the language in a given utterance from the short-term acoustic features. We show how DNNs are particularly suitable to perform LID in rea...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2015

Bringing contextual information to google speech recognition

نویسندگان

چکیده

منابع مشابه

Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods

Two-step correction of speech recognition errors based on n-gram and long contextual information

Developing a Standardized Medical Speech Recognition Database for Reconstructive Hand Surgery

An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition

Frame-by-frame language identification in short utterances using deep neural networks

عنوان ژورنال:

اشتراک گذاری